Maximum likelihood normalization for robust speech recognition
Abstract
It is well known that additive and channel noise cause shifts and scaling in MFCC features. Empirical normalization techniques that estimate and compensate for these effects, such as cepstral mean subtraction and variance normalization, have been shown to be useful. However, these empirical estimates may not be optimal. In this paper, we approach the problem from two directions: 1) using more robust MFCC-based features that are less sensitive to additive and channel noise, and 2) proposing a maximum likelihood (ML) based approach to compensate for the noise effect. In addition, we propose the use of multi-class normalization, in which different normalization factors can be applied to different phonetic units. The combination of the robust features and ML normalization is particularly useful for the highly mismatched condition in the Aurora 3 corpus, yielding a 15.8% relative improvement in the highly mismatched case and a 10.4% relative improvement on average over the three conditions.
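As an illustration of the empirical baseline the abstract mentions, the following is a minimal sketch of per-utterance cepstral mean subtraction with variance normalization, plus a simple per-class variant. It is not the paper's ML estimator, which the abstract does not specify. The function names `cmvn` and `multiclass_cmvn`, the `eps` guard, and the assumption that a frame-level class label (for example, from a phonetic alignment) is available are all hypothetical choices made for this example.

```python
import math


def cmvn(frames, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    frames: non-empty list of feature vectors (one per frame), equal length.
    Returns new vectors in which each coefficient has zero mean and
    (approximately) unit variance across the given frames.
    """
    n = len(frames)
    dim = len(frames[0])
    # Sample mean of each coefficient: the estimated "shift".
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Sample standard deviation of each coefficient: the estimated "scaling".
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps (an assumption of this sketch) avoids division by zero.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


def multiclass_cmvn(frames, labels, eps=1e-8):
    """Normalize each frame using statistics of frames with the same label.

    labels: one hypothetical class label per frame (e.g. "speech"/"silence").
    """
    out = [None] * len(frames)
    for label in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        normed = cmvn([frames[i] for i in idx], eps)
        for i, row in zip(idx, normed):
            out[i] = row
    return out
```

In this sketch the shift and scaling factors are plain sample statistics; a maximum likelihood approach would instead choose them to maximize the likelihood of the normalized features under the acoustic model.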
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
An investigation of likelihood normalization for robust ASR
Noise-robust automatic speech recognition (ASR) systems rely on feature and/or model compensation. Existing compensation techniques typically operate on the features or on the parameters of the acoustic models themselves. By contrast, a number of normalization techniques have been defined in the field of speaker verification that operate on the resulting log-likelihood scores. In this paper, we...
Speaker normalization and pronunciation variant modeling: helpful methods for improving recognition of fast speech
This paper addresses the problem of creating hidden Markov models for fast speech. The major issues discussed are robust parameter estimation and reducing within-model variations. Regarding the first issue, the use of maximum a posteriori parameter estimation is discussed. To reduce within-model variations, a maximum likelihood based vocal tract length normalization procedure and a...
Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech
The article describes the development of a speech recognition system in the Bengali language for the aging population using various adaptation techniques. Variability in acoustic characteristics among different speakers degrades speech recognition accuracy. In general, perceptual as well as acoustical variations exist among speakers, but the variations are more pronounced between the young and aged populations. Devi...
Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions
In a traditional HMM compensation approach to robust speech recognition that uses a Vector Taylor Series (VTS) approximation of an explicit model of environmental distortions, the set of generic HMMs is typically trained from “clean” speech only. In this paper, we present a maximum likelihood approach to training generic HMMs from both “clean” and “corrupted” speech based on the concept of irrel...
Publication date: 2003